Comparing Lexical Relationships Observed within Japanese Collocation Data and Japanese Word Association Norms
Abstract
While large-scale corpora and various corpus query tools have long been recognized as essential language resources, the value of word association norms as language resources has been largely overlooked. This paper presents initial comparisons of the lexical relationships observed within Japanese collocation data extracted from a large corpus using the Japanese language version of the Sketch Engine (SkE) tool (Srdanović et al., 2008) and the relationships found within Japanese word association sets taken from the large-scale Japanese Word Association Database (JWAD) under ongoing construction by Joyce (2005, 2007). The comparison results indicate that while some relationships are common to both linguistic resources, many lexical relationships are observed in only one resource. These findings suggest that both resources are necessary to cover the diverse range of lexical relationships more adequately. Finally, the paper reflects briefly on implementing in electronic dictionaries the association-based word-search strategies proposed by Zock and Bilac (2004) and Zock (2006).
Similar resources
Coling 2008 22nd International Conference on Computational Linguistics
Collocations as Word Co-occurrence Restriction Data - An Application to Japanese Word Processor
Collocations, combinations of specific words, are quite useful linguistic resources for NLP in general. The purpose of this paper is to show their usefulness, exemplifying an application to Kanji character decision processes for Japanese word processors. Unlike recent trials of automatic extraction, our collocations were collected manually through many years of intensive investigation of corp...
Extracting Bilingual Collocations from Non-Aligned Parallel Corpora
This paper proposes a new method to find correspondences of uninterrupted collocations from Japanese-English bilingual corpora without sentence-to-sentence alignment. Uninterrupted collocations in English such as “once again”, “give up”, or “gross national product” handled as a single word or a compound word in Japanese, can be automatically extracted with corresponding Japanese words using wor...
Sense Classification of Verbal Polysemy based-on Bilingual Class/Class Association
In the field of statistical analysis of natural language data, the measure of word/class association has proved to be quite useful for discovering a meaningful sense cluster in an arbitrary level of the thesaurus. In this paper, we apply its idea to the sense classification of Japanese verbal polysemy in case frame acquisition from Japanese-English parallel corpora. Measures of bilingual clas...
Large Scale Collocation Data and Their Application to Japanese Word Processor Technology
Word processors and computers used in Japan employ a Japanese input method based on keystrokes combined with Kana (phonetic) character to Kanji (ideographic, Chinese) character conversion technology. The key factor in Kana-to-Kanji conversion technology is how to raise the accuracy of the conversion through homophone processing, since there are so many homophonic Kanji. In this paper, we re...
Publication date: 2008